--------------------------------------------------------------------------------
README for: TWIN BIRTH AND MATERNAL CONDITION 
The Review of Economics and Statistics
By: Sonia Bhalotra and Damian Clarke
Questions: damian.clarke@usach.cl

yyyy-mm-dd: 2018-08-01
Version 2: Uploading zip files to maintain directory structure
--------------------------------------------------------------------------------

>>> REPLICATION INSTRUCTIONS

This folder contains all publicly available data and all source code used to
generate tables and results for the paper "Twin Birth and Maternal Condition".
All files ending in .do should be executed in Stata (version 11.0 or higher) and
all files ending in .py should be executed in Python (version 2.xx). The Python
programs are only used to format some of the final tables and to produce two
appendix figures, so all results can be replicated without running the Python
code if desired. The replication materials provided here allow the Main Tables
as well as the Appendix Figures and Tables to be generated.  Below, any Figure
or Table labelled A# is an Appendix Figure/Table; otherwise it is one of the
tables from the body of the paper. All main tables are exported to the
"results/main" folder, and all appendix figures and tables are exported to the
"results/appendix" folder.  All materials can be replicated directly from the
"source" subfolder without changing any details in the .do or .py files.  Log
files corresponding to each program are provided in the "log" directory.
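
Because the programs rely on the folder names given above, it may be worth
checking the layout once the zip files have been extracted. The following is a
minimal sketch in Python and is not part of the replication materials: the
function name missing_folders and the choice to pass the root path as an
argument are assumptions made for illustration, and the folder list contains
only folders named in this README.

```python
import os

# Folders named in this README, relative to the root of the unzipped materials.
EXPECTED_FOLDERS = [
    "source",
    "data",
    "log",
    os.path.join("results", "main"),
    os.path.join("results", "appendix"),
]


def missing_folders(root):
    """Return the expected folders that are not present under root."""
    return [f for f in EXPECTED_FOLDERS
            if not os.path.isdir(os.path.join(root, f))]


if __name__ == "__main__":
    # Hypothetical usage: run from the root of the replication materials.
    missing = missing_folders(".")
    if missing:
        print("Missing folders: " + ", ".join(missing))
    else:
        print("All expected folders found.")
```

An empty list from missing_folders suggests the directory structure survived
extraction; it says nothing about whether the files inside are complete.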

As laid out below, a number of datasets require prior application before results
can be replicated.  In each case, the procedure to follow to request data is
described at the end of this README file. When one of the programs listed below
requires proprietary data that is not bundled with these replication materials,
this is indicated in square brackets next to the program name.  To replicate
results in the paper, the following files should be used (all located in the
"source" folder):

(a) analysisWorld.do
    > Runs regressions for panels A, D and E of Table 2
    > Runs regressions for panel B of Table 4
    > Runs regressions for panels A, D and E of Appendix Tables A1, A2, and A4
    > Runs regressions for Table A3, A5, A6, A7, A10, A12, A14, A15
    > Creates Figures A2, A3, A5.

(b) analysisUSFetalDeaths.do
    > Creates Table 5
    > Creates Appendix Table A11

(c) analysisALSPAC.do [REQUIRES PRIOR REQUEST OF DATA FROM ALSPAC]
    > Runs regressions for panel C of Table 2
    > Runs regressions for panel C of Appendix Tables A1, A2 and A4
    > Exports summary statistics for panel C of Appendix Table A14
    
(d) analysisSweden.do [REQUIRES PRIOR REQUEST OF DATA FROM SWEDISH MEDICAL REGISTRY]
    > Runs regressions for panel B of Table 2
    > Runs regressions for panel B of Appendix Tables A1, A2 and A4
    > Exports summary statistics for panel B of Appendix Table A14

(e) analysisDHS.do
    > Runs regressions for panel A of Table 4
    > Runs regressions/analysis for Table 6
    > Creates Figure A1
    > Runs regressions for Table A8

(f) analysisNLSY.do
    > Creates summary statistic Table A16
    > Creates summary plots Figure A7
    > Runs regressions in Table A17

(g) worldPlots.py
    > Takes results output from analysisWorld.do and plots Appendix Figures A4
      and A6
    
(h) worldTables.py
    > Formats Table 2 to appear as one table (uses results from analysisWorld.do)
    > Formats Tables A1, A2, A3, A4, A7 and A10 as per online appendix
    > Formats Appendix Tables 

Of the files described above, (a)-(f) should be run with Stata. Any user-written 
ados required to be installed from the SSC are indicated in each file. For files 
(g)-(h) Python should be used.  File (g) requires that the matplotlib and numpy 
libraries are available.
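
The run order described above can be sketched as a small Python helper. This is
an illustration under stated assumptions, not part of the replication materials:
the executable names "stata" and "python", the batch-mode call
"stata -b do <file>" (the Unix console form, which differs on Windows), and the
names STATA_PROGRAMS, PYTHON_PROGRAMS, build_commands and missing_libraries are
all assumed. File names are taken from the list above, with the Swedish program
written as analysisSweden.do, the name used in the data instructions.

```python
import sys

# (file name, needs restricted data?) as described in this README.
STATA_PROGRAMS = [
    ("analysisWorld.do", False),
    ("analysisUSFetalDeaths.do", False),
    ("analysisALSPAC.do", True),   # data must be requested from ALSPAC
    ("analysisSweden.do", True),   # data must be requested from Socialstyrelsen
    ("analysisDHS.do", False),
    ("analysisNLSY.do", False),
]
# The Python programs use output from the Stata programs, so they run last.
PYTHON_PROGRAMS = ["worldPlots.py", "worldTables.py"]


def build_commands(have_restricted_data=False, stata_exe="stata",
                   python_exe="python"):
    """Return the command lines to run from inside the "source" folder."""
    commands = []
    for name, restricted in STATA_PROGRAMS:
        if restricted and not have_restricted_data:
            continue  # skip programs whose data must be applied for
        commands.append([stata_exe, "-b", "do", name])
    for name in PYTHON_PROGRAMS:
        commands.append([python_exe, name])
    return commands


def missing_libraries(names=("matplotlib", "numpy")):
    """Return the libraries from names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Print the plan rather than executing it, so nothing runs unexpectedly.
    for cmd in build_commands():
        print(" ".join(cmd))
    for lib in missing_libraries():
        sys.stderr.write("Library not found: %s\n" % lib)
```

The sketch only prints the commands. Whether the Python formatting programs run
cleanly without output from the two restricted-data programs is not stated in
this README.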

--------------------------------------------------------------------------------

>>> DATA INSTRUCTIONS

Eight principal data sources are used in this paper and, where possible, these
are provided in the 'data' subfolder of these replication materials. The data
sources are described below, along with the format in which they are provided
here or, in a few cases, instructions on how to apply for restricted data:

(1) United States Vital Statistics:
    - Natality Microdata, 1999-2002; 2009-2013
    - Fetal Death Microdata, 1999-2002
    ** These are provided in the "data/USA" subfolder, precisely as downloaded
    from the National Vital Statistics System homepage.  All variable names are
    documented in the National Vital Statistics System manuals provided along
    with the data.  See here: https://www.cdc.gov/nchs/nvss/index.htm

(2) Demographic and Health Surveys
   - All publicly available surveys from 1990-2013
   ** All publicly available DHS surveys are downloaded from the DHS website and
   are merged and appended to create a file with one line per child matched to
   their mother.  We provide three files here based on DHS data:
     (A) DHS_twins: 1 line per child matched to mother, currently alive
     (B) DHS_twins_IMR: 1 line per child whether or not alive
     (C) TwinsMMR: 1 line per mother's sister and whether she died in childbirth
   If desired, this data can be regenerated from source using a script which
   automates the downloading and merging of the DHS data, and is provided on
   Clarke's website on the "computation" page.  Otherwise, all results can be
   replicated from the generated data provided in the data subfolder.

(3) The Chilean Early Life Longitudinal Survey (ELPI)
   - We use the first wave of this survey 
   ** A description of this data is available at the following official website:
   http://www.elpi.cl/
   We provide the full data used in the 'data' subfolder

(4) The Avon Longitudinal Study of Parents and Children (ALSPAC)
  ** This data is restricted, and requires prior application.  The precise name
  of each variable we applied for is provided in the "analysisALSPAC.do" file in
  the "source" subfolder.  We provide full replication materials in this do file
  and full log files showing output; however, to replicate results, the data must
  be requested from the ALSPAC study team, who charge a data processing fee. All
  details on data access are available at the following website:
  http://www.bristol.ac.uk/alspac/
  Please contact damian.clarke@usach.cl if any additional information is
  required, or for a full copy of the application form completed to request this
  data if desired.

(5) Swedish Medical Birth Registry (Socialstyrelsen)
  ** The data to replicate the regression of twinning on maternal
  characteristics in Sweden is generated from the Swedish Medical Birth Registry.
  This data is available by application from the Swedish Socialstyrelsen, but
  must be accessed from within Sweden on a secure server. The raw data is the
  universe of live births from 1990-2011.  The script which uses this raw data
  to produce the regression in Table 2 panel B is called analysisSweden.do and is
  available in the 'source' folder of these replication materials. It is based on
  the raw data provided by the Socialstyrelsen, and all variables are named as in
  the original data. Log files are provided, as are full formatted regression
  results. A full description of the data, along with details of how to access
  it, is available at the following website:
  http://www.socialstyrelsen.se/register/halsodataregister/medicinskafodelseregistret

(6) Spanish Microdata on births
   ** Table 3 of the paper replicates Quintana-Domeque and Ródenas-Serrano
   (2017)'s analysis of the impact of a prenatal stress shock on the likelihood
   of giving birth to twins.  This is based on the full Spanish microdata on
   births available at the following site:
   http://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736177007&menu=resultados&secc=1254736195443&idp=1254735573002

   A complete description of this data is provided in Quintana-Domeque and Ródenas-
   Serrano (2017):
   https://www.sciencedirect.com/science/article/abs/pii/S0167629617308093

(7) National Longitudinal Survey of Youth (Young Women data)
   ** We provide raw data as exported from the NLSY data portal.  All processing
   and analysis is provided in the analysisNLSY.do file. All variable names are
   provided by the NLSY, and all variables are fully described at the NLS
   Investigator info portal, where data can be downloaded for free (url below).
   This data is also provided in the 'data' subfolder of these replication
   materials.
   https://www.nlsinfo.org/investigator/pages/search.jsp?s=NLSW
